If you're not up to speed on the sorting machine project, check out my introduction article. It explains the mad engineering challenge of building a Lego sorting machine using a neural network.
Jamie and I have been very busy since the start of 2019, and we have a lot to show off in the coming weeks! We've been rebuilding many of the proof-of-concept components of the sorting machine into functional components made of quality materials, i.e. not hot glue! We still have some more work to do on those, so you'll need to wait a week or three before seeing them.
We now have an improved LED box and camera mount:
As you can see, it now features light-diffusing plastic sheets. These prevent highlights from appearing on smooth pieces, which could otherwise cause the neural network to think a piece has edges that it does not.
As you can (probably not) see, the box now contains twice as many LEDs as it did before. The diffusing sheets absorb some of the light, and we didn't have quite enough of it before, so we doubled down and put a whole extra roll in there. The camera now takes much clearer/sharper pictures than before, and we pick up dark pieces more reliably too.
It's not all rainbows and unicorns though. The taxidermist (the program that captures images, detects the parts, crops the image, and sends it to the sorting server) had been written by Jamie and me at the very start of the project, and it was a mess. We had hacked together a patchwork assortment of code fragments from various tutorials, and it accidentally did the job as long as you didn't look at it sideways or breathe in its general direction.
It could never work if more than one part at a time were in the frame, and if one part entered the frame as another was leaving, the taxidermist would miss the first part entirely.
And for some unknown reason it always took a blank photo of the conveyor belt when it started up.
Starting Over
Obviously we needed something better. I sat down and tore the taxidermist to pieces. I think I re-used about 5% of the code from the old one. The new taxidermist is cleaner, faster, harder, better, stronger.
I wrote an entirely new algorithm for detecting parts. It's not fully optimized yet, but it already handles parts much more predictably and isn't fooled by multiple parts passing by at a time. It relies heavily on OpenCV, a very powerful image and video processing library for C++ and Python.
The core idea is simple: We look at each frame of the video and count the number of distinct shapes present. If the number is not zero, then obviously there is a part in front of the camera, and we should tell the server about it.
There is a significant problem, however: since a part is in front of the camera for many consecutive frames, how do we know whether the part in the frame now is the same as the part from the last frame, or an entirely new one? You and I can easily tell just by looking at it, but computers generally suck at abstract concepts compared to humans.
Moving Pictures
I managed to make it work after a few days of tinkering and I'm quite happy with the result.
Below is a video of it running in real time. The timer on the bottom left tracks how long it takes to process each frame, in milliseconds. My target frame rate is 30 frames per second, which translates to a budget of approximately 0.033 seconds per frame. We're averaging about 0.020, so it's plenty fast already.
Also, notice that it waits until a part is fully in view before tracking it, and removes it from the tracker just before the bottom of the frame. It struggles a lot with dark pieces, as you'll see in the video. Fixing that involves tweaking settings in a few different places. It's a bit fiddly, and I'll work on it this weekend.
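If you're wondering how a timer like that works, here's a minimal sketch of the idea in Python. The video file name and the overlay position are placeholders, and the actual per-frame work is elided:

import time
import cv2

cap = cv2.VideoCapture("belt.mp4")  # placeholder video source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    start = time.perf_counter()
    # ... the detective and tracker would do their work here ...
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Draw the elapsed time in the bottom left corner of the frame.
    cv2.putText(frame, f"{elapsed_ms:.1f} ms", (10, frame.shape[0] - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
    cv2.imshow("taxidermist", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()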
So um..... how does it work?
The answer is not simple, and requires two separate tasks to be performed. The first task is to determine if there are any parts at all in the image from the camera, and then get some useful data about them. I call this part of the program the detective. The second task is to track that data from one frame to the next. I call that part the tracker.
Let's start with the detective.
Elementary, My Dear Watson.
The detective was actually pretty simple to write. OpenCV has a plethora of useful tools for this job. I used the MOG2 background subtractor to get black-and-white images where the part is highlighted in white and the belt is all black. I then call several contour-related functions which find bounding boxes for all the shapes in the image.
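Here's a stripped-down sketch of that pipeline. The real code has a lot more tuning; the video source, the noise threshold, and the shadow cutoff here are all placeholder guesses:

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2()  # learns what the empty belt looks like
cap = cv2.VideoCapture("belt.mp4")  # placeholder video source

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # white blobs on a black background
    # Drop the grey shadow pixels MOG2 marks, keeping only solid foreground.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    # OpenCV 4.x returns (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        if cv2.contourArea(contour) < 500:  # ignore tiny specks of noise
            continue
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)  # blue box (BGR)
    cv2.imshow("detective", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()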
The result is a greyscale image like the one below, and a bunch of data representing bounding boxes for each white blob. We can superimpose the bounding box data from the greyscale image onto the original image to see it working:
The blue rectangles are returned by OpenCV as two pixel coordinates, the top left corner and the bottom right corner. I calculate the center point of that box, create a unique id for it, and then store all of that data in an object called a part_parameters, which looks like this:
part_parameters:
index = 17 # the unique part number; it goes up by 1 for each new part
min_x = 561 # top left corner x coordinate
min_y = 229 # top left corner y coordinate
max_x = 685 # bottom right corner x coordinate
max_y = 379 # bottom right corner y coordinate
center_x = 623 # center x coordinate
center_y = 304 # center y coordinate
Each part in the image above gets one of those objects associated with it. The detective then passes the list of those objects to the tracker so it can figure out what to do with them.
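For the curious, in Python that object is just a small class. Here's one way to sketch it as a dataclass (renamed PartParameters to follow Python class naming conventions); the from_bounding_box helper is my own illustration of how the detective could build one from the (x, y, w, h) box OpenCV returns:

from dataclasses import dataclass

@dataclass
class PartParameters:
    index: int     # unique part number; goes up by 1 for each new part
    min_x: int     # top left corner x coordinate
    min_y: int     # top left corner y coordinate
    max_x: int     # bottom right corner x coordinate
    max_y: int     # bottom right corner y coordinate
    center_x: int  # center x coordinate
    center_y: int  # center y coordinate

def from_bounding_box(index, x, y, w, h):
    """Build a PartParameters from the (x, y, w, h) tuple cv2.boundingRect returns."""
    return PartParameters(index, x, y, x + w, y + h, x + w // 2, y + h // 2)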
Tracking is Tricky
The tracker consists of two lists simply named "old" and "new". The new list contains every part_parameters object that was found in the current frame, and the old list contains every part_parameters object that was seen in the previous frame.
Starting from the bottom-most shape, the tracker compares the center_x and center_y coordinates of each part in the new list against those of the parts in the old list. Each new part is "mapped" to the old part whose center is the closest match.
The mapping is simple: we overwrite all of the x and y coordinates in the old part_parameters with the new ones. This means that when the next frame comes around, the old list will be filled with all of the x/y coordinates from this frame.
Of course, this is limited by the center_y coordinate; a new part can never match to an old part that is farther down the screen. The parts never move backwards, so the new part MUST be farther down than the old one.
This constraint is how we know when parts have come and gone from the camera. When a part first enters the frame at the top, there are no old parts above it for it to map to, so it must be new. It also tells us when a part leaves the frame: if a part from the old list did not get a new part mapped to it this frame, it must be gone, so we remove it from the list.
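To make the matching concrete, here's a heavily simplified sketch. It reuses the PartParameters sketch from above, and the Manhattan distance and 50-pixel threshold are my own assumptions, not necessarily what our tracker uses. (I carry the old part's id forward onto the new object, which amounts to the same thing as overwriting the old coordinates.)

def match_parts(old_parts, new_parts, max_distance=50):
    """Map each new part onto the closest eligible old part (simplified sketch)."""
    unmatched_old = list(old_parts)
    tracked = []
    # Work through the new shapes from the bottom of the frame upward.
    for new in sorted(new_parts, key=lambda p: p.center_y, reverse=True):
        best, best_dist = None, max_distance
        for old in unmatched_old:
            if old.center_y > new.center_y:
                continue  # parts never move backwards up the belt
            dist = abs(new.center_x - old.center_x) + abs(new.center_y - old.center_y)
            if dist < best_dist:
                best, best_dist = old, dist
        if best is not None:
            unmatched_old.remove(best)
            new.index = best.index  # same physical part: carry its id forward
        tracked.append(new)
    # Anything left in unmatched_old has left the frame and is simply dropped.
    return tracked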
Once we have mapped all of the new parts to the old ones and cleaned up the parts that have gone, we send the part_parameters objects back to the detective. It can use those min_x/y and max_x/y coordinates to crop pictures out of the frame and send them off to the sorting machine.
That's it for now, folks. Jamie will be checking in sometime in the next few days with pictures and video footage of our new parts feeder. Building that was a harrowing affair, and we'd not have survived without his wife patiently supplying us with caffeine and casserole after 11pm while we repeatedly insisted that we'd wrap it up "soon".